Porphyromonas gingivalis nucleotides

FIELD OF THE INVENTION 

The present invention relates to P. gingivalis nucleotide sequences,
P. gingivalis polypeptides and probes for detection of P. gingivalis.

BACKGROUND OF THE INVENTION 

Periodontal diseases are bacterial-associated inflammatory diseases
of the supporting tissues of the teeth and range from the relatively mild form
of gingivitis, the non-specific, reversible inflammation of gingival tissue, to
the more aggressive forms of periodontitis, which are characterised by the
destruction of the tooth's supporting structures. Periodontitis is associated
with a subgingival infection of a consortium of specific Gram-negative
bacteria that leads to the destruction of the periodontium and is a major
public health problem. One bacterium that has attracted considerable
interest is P. gingivalis, as this microorganism can account for up to 50% of
the subgingival anaerobically cultivable flora recovered from adult
periodontitis lesions, whereas P. gingivalis is rarely recovered, and then in
low numbers, from healthy sites. A proportional increase in the level of
P. gingivalis in subgingival plaque has been associated with an increased
severity of periodontitis, and eradication of the microorganism from the
cultivable subgingival microbial population is accompanied by resolution of
the disease. The progression of periodontitis lesions in non-human primates
has been demonstrated with the subgingival implantation of P. gingivalis.
These findings in both animals and humans suggest a major role for
P. gingivalis in the development of adult periodontitis.

P. gingivalis is a black-pigmented, anaerobic, asaccharolytic,
proteolytic Gram-negative rod that obtains energy from the metabolism of
specific amino acids. The microorganism has an absolute growth
requirement for iron, preferentially in the form of haeme or its Fe(III)
oxidation product haemin, and when grown under conditions of excess
haemin is highly virulent in experimental animals. A number of virulence
factors have been implicated in the pathogenicity of P. gingivalis, including
the capsule, adhesins, cytotoxins and extracellular hydrolytic enzymes. In
particular, proteases have received a great deal of attention for their ability
to degrade a broad range of host proteins, including structural proteins and
others involved in defence. The proteins that have been shown to be
substrates for P. gingivalis proteolytic activity include collagen types I and
IV, fibronectin, fibrinogen, laminin, complement and plasma clotting
cascade proteins, α1-antitrypsin, α2-macroglobulin, antichymotrypsin,
antithrombin III, antiplasmin, cystatin C, IgG and IgA. The major proteolytic
activities associated with this organism have been defined by substrate
specificity and are "trypsin-like", that is, cleavage on the carboxyl side of
arginyl and lysyl residues, and collagenolytic, although other minor activities
have been reported.

P. gingivalis trypsin-like proteolytic activity has been shown to
degrade complement, generating biologically active C5a, impair the
phagocytic and other functions of neutrophils by modifying surface
receptors, and abrogate the clotting potential of fibrinogen, prolonging
plasma clotting time. The trypsin-like proteolytic activity of P. gingivalis
also generates Fc fragments from human IgG1, stimulating the release of
pro-inflammatory cytokines from mononuclear cells, and is associated with
vascular disruption and enhanced vascular permeation through the
activation of the kallikrein-kinin cascade. P. gingivalis spontaneous mutants
with reduced trypsin-like activity, as well as wild-type cells treated with the
trypsin-like protease inhibitor N-p-tosyl-L-lysine chloromethyl ketone, are
avirulent in animal models. Further, it has been shown that P. gingivalis
cells grown under controlled, haemin-excess conditions expressed more
trypsin-like and less collagenolytic activity and were more virulent in mice
relative to cells grown under haemin-limited but otherwise identical
conditions. The increased expression of the trypsin-like activity by the more
virulent P. gingivalis has led to the speculation that the trypsin-like
proteolytic activity may be the major determinant for infection or disease.

There has been considerable endeavour to purify and characterise
the trypsin-like proteases of P. gingivalis from cell-free culture fluids. Chen
et al. (1992) [J Biol Chem 267:18896-18901] have purified and characterised a
50 kDa arginine-specific thiol protease, designated Arg-gingipain, from the
culture fluid of P. gingivalis H66. A similar arginine-specific thiol protease
has been disclosed in JP 07135973, and the amino acid sequence has been
disclosed in WO 9507286 and in Kirszbaum et al. (1995) [Biochem Biophys Res
Comm 207:424-431]. Pike et al. (1994) [J Biol Chem 269:406-411] have
characterised a 60 kDa lysine-specific cysteine proteinase, designated
Lys-gingipain, from the culture fluid of P. gingivalis H66. The partial gene
sequence for this enzyme was disclosed in WO 9511298 and the full sequence
in WO 9617936.

In order to develop an efficacious and safe vaccine to prevent
P. gingivalis colonisation it is necessary to identify and produce antigens that
are involved in virulence and that have utility as immunogens to generate
neutralising antibodies. Whilst it is possible to attempt to isolate antigens
directly from cultures of P. gingivalis, this is often difficult. For example, as
mentioned above, P. gingivalis is a strict anaerobe and can be difficult to
isolate and grow. It is also known that, for a number of organisms, many
virulence genes are down-regulated when the organism is cultured in vitro
and the encoded proteins are no longer expressed. If conventional chemistry
techniques were applied to purify vaccine candidates, potentially important
(protective) molecules may not be identified. With DNA sequencing, as the
gene is present (but not transcribed) even when the organism is grown in
vitro, it can be identified, cloned and produced as a recombinant
protein. Similarly, a protective antigen or therapeutic target may be
transiently expressed by the organism in vitro or produced in low levels,
making the identification of these molecules extremely difficult by
conventional methods.

With serological identification of therapeutic targets one is limited to
those responses which are detectable using standard methods such as
Western blotting or ELISA. The limitation here is both the level of
response that is generated by the animal or human and determining whether
this response is protective, damaging or irrelevant. No such limitation is
present with a sequencing approach to the identification of potential
therapeutic or prophylactic targets.

It is also well known that P. gingivalis produces a range of broadly
active proteases (University of Melbourne International Patent Application
No PCT/AU96/00673, US Patent Nos 5,475,097 and 5,523,390), which make
the identification of intact proteins difficult because of their degradation by
these proteases.
SUMMARY OF THE INVENTION

The present inventors have attempted to isolate P. gingivalis
nucleotide sequences which can be used for recombinant production of
P. gingivalis polypeptides and to develop nucleotide probes specific for
P. gingivalis. The DNA sequences listed below have been selected from a
large number of P. gingivalis sequences according to their indicative
potential as vaccine candidates. This intuitive step involved comparison of
the deduced protein sequence from the P. gingivalis DNA sequences to the
known protein sequence databases. Some of the characteristics used to
select useful vaccine candidates include: the expected cellular location, such
as outer membrane proteins or secreted proteins; particular functional
activities of similar proteins, such as those with an enzymatic or proteolytic
activity; proteins involved in essential metabolic pathways that when
inactivated or blocked may be deleterious or lethal to the organism; proteins
that might be expected to play a role in the pathogenesis of the organism,
e.g. red cell lysis, cell agglutination or cell receptors; and proteins which are
paralogues to proteins with proven vaccine efficacy. DNA sequences that
were considered to be poor vaccine candidates and not selected include
those that code for proteins involved in replication, non-essential proteins
involved in cellular processes, and those proteins present at sites that would
be unlikely to be affected by immune mediators, such as those found in the
bacterial cytoplasm or inner membranes.

In a first aspect the present invention consists in an isolated
P. gingivalis nucleotide sequence, the nucleotide sequence consisting of or
including a sequence selected from the group consisting of SEQ ID NO: 1,
SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID NO: 6,
SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID
NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16,
SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID
NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25,
SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, and SEQ ID
NO: 30.

In a second aspect the present invention consists in an isolated
P. gingivalis polypeptide, the polypeptide being at least partially encoded by
a nucleotide consisting of or including a sequence selected from the group
consisting of SEQ ID NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4,
SEQ ID NO: 5, SEQ ID NO: 6, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 10,
SEQ ID NO: 11, SEQ ID NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID
NO: 15, SEQ ID NO: 16, SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19,
SEQ ID NO: 20, SEQ ID NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID
NO: 24, SEQ ID NO: 25, SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28,
SEQ ID NO: 29, and SEQ ID NO: 30.

In a third aspect the present invention consists in a nucleotide probe
specific for P. gingivalis, the probe including a detectable label and a
nucleotide sequence of at least 15(?) nucleotides, the nucleotide sequence
being derived from a sequence selected from the group consisting of SEQ ID
NO: 1, SEQ ID NO: 2, SEQ ID NO: 3, SEQ ID NO: 4, SEQ ID NO: 5, SEQ ID
NO: 6, SEQ ID NO: 7, SEQ ID NO: 9, SEQ ID NO: 10, SEQ ID NO: 11, SEQ ID
NO: 12, SEQ ID NO: 13, SEQ ID NO: 14, SEQ ID NO: 15, SEQ ID NO: 16,
SEQ ID NO: 17, SEQ ID NO: 18, SEQ ID NO: 19, SEQ ID NO: 20, SEQ ID
NO: 21, SEQ ID NO: 22, SEQ ID NO: 23, SEQ ID NO: 24, SEQ ID NO: 25,
SEQ ID NO: 26, SEQ ID NO: 27, SEQ ID NO: 28, SEQ ID NO: 29, and SEQ ID
NO: 30, or a sequence complementary thereto.
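
By way of illustration only, the following Python sketch shows how candidate probe sequences of a chosen minimum length could be enumerated from a listed sequence and from its complementary strand. The function names, the default window of 15 nucleotides and the rule of skipping windows that contain the ambiguity code "n" are assumptions made for this example and are not part of the specification. The sketch does not test whether a candidate is specific for P. gingivalis; that would require a separate comparison against other genomes.

```python
from typing import List

# Complement map for the lowercase alphabet used in the sequence listing.
COMPLEMENT = str.maketrans("acgtn", "tgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (lowercase output)."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def candidate_probes(template: str, length: int = 15) -> List[str]:
    """List every window of `length` nucleotides on both strands.

    Windows containing the ambiguity code 'n' are skipped (an assumption
    of this sketch, not a requirement stated in the text).
    """
    probes = []
    for strand in (template.lower(), reverse_complement(template)):
        for start in range(len(strand) - length + 1):
            window = strand[start:start + length]
            if "n" not in window:
                probes.append(window)
    return probes


# First 50 nucleotides of SEQ ID NO: 1 as listed in this document.
template = "ttcgtccacatcgtcgccttcggggatcaccgtgatctggtcacaccggg"
probes = candidate_probes(template, length=15)
print(len(probes), probes[0])
```

A longer window can be requested through the `length` argument; the sketch only enumerates sequences and leaves the choice of detectable label entirely open.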

DETAILED DESCRIPTION

Preparation of the P. gingivalis library for sequencing

To determine the DNA sequence of P. gingivalis, genomic DNA was isolated
from P. gingivalis strain W50 (ATCC 53978) essentially by the method
described by Marmur J. (1961). Cloning of DNA fragments was performed
essentially as described by Fleischmann et al. (1995). Briefly, purified
genomic DNA from P. gingivalis was nebulized to fragment the DNA,
treated with Bal31 nuclease to create blunt ends, then run twice on
preparative 1% agarose gels. DNA fragments of 1.6-2.0 kb were excised from
the gel and the DNA recovered. This DNA was then ligated to the vector
pUC18 (SmaI digested and dephosphorylated; Pharmacia) and
electrophoresed on a 1% agarose preparative gel. The fragment comprising
linear vector plus one insert was excised and purified, and this process was
repeated to reduce contamination by vector without insert. The recovered
vector plus insert DNA was blunt-ended with T4 DNA polymerase, then a final
ligation to produce circular DNA was performed. Aliquots of Epicurian Coli
Electroporation-Competent Cells (Stratagene) were transformed with the
library DNA, plated out on SOB agar antibiotic diffusion plates
containing X-gal and incubated at 37°C overnight. Colonies with inserts
appeared white and those without inserts (vector alone) appeared blue.
Plates were stored at 4°C until the white clones were picked and expanded
for the extraction of plasmid DNA for sequencing.

DNA sequencing 

Plasmid DNA was prepared by picking bacterial colonies into 1.5 ml
of LB, TB or SOB broth supplemented with 50-100 µg/ml ampicillin in 96
deep-well plates. Plasmid DNA was isolated using the QIAprep Spin or
QIAprep 96 Turbo miniprep kits (QIAGEN GmbH, Germany). DNA was
eluted into a 96-well gridded array and stored at -20°C.

Sequencing reactions were performed using ABI PRISM Dye
Terminator and ABI PRISM BigDye Terminator Cycle Sequencing Ready
Reaction kits with AmpliTaq DNA polymerase FS (PE Applied Biosystems,
Foster City, CA) using the M13 Universal forward and reverse sequencing
primers. Sequence reactions were conducted on either a Perkin-Elmer
GeneAmp 9700 (PE Applied Biosystems) or Hybaid PCR Express (Hybaid,
UK) thermal cycler. Sequencing reactions were analysed on ABI PRISM
377 DNA sequencers (PE Applied Biosystems).

The sequences obtained are set out below. 

DNA sequence analysis

Raw trace data files from the ABI 377 sequencer were manually
trimmed using Staden Pregap (Laboratory of Molecular Biology, Medical
Research Council, UK) running on a Sun Microsystems computer. Trimmed
files were assembled into contigs using Staden Gap v4.1 and exported as a
consensus file in FastA format. This consensus was converted into GCG
format files and analysed for homology using the BLASTX algorithm
[Altschul et al.] on a non-redundant protein database compiled by ANGIS
(Australian Genomic Information Service, University of Sydney). Individual
BLAST search results were examined for significant homology by statistical
probability and amino acid alignments.
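
As an illustrative aside, the sketch below shows in Python the two data-handling ideas behind this analysis: reading a consensus file in FastA format, and translating a nucleotide sequence in all six reading frames, which is the kind of conceptual translation a BLASTX-style search compares against a protein database. It is a minimal sketch under stated assumptions and not the Staden, GCG or BLAST software itself: the function names are hypothetical, the standard genetic code is assumed, and any codon containing an ambiguous base is reported as "X".

```python
from itertools import product

# Standard genetic code, built from the conventional t, c, a, g ordering.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("acgtn", "tgcan")


def parse_fasta(text):
    """Parse FastA text into a dict of {record name: lowercase sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line.lower())
    return {key: "".join(chunks) for key, chunks in records.items()}


def translate(seq):
    """Translate complete codons; unknown or ambiguous codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return translations of the three forward and three reverse frames."""
    seq = seq.lower()
    rc = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames


contigs = parse_fasta(">contig1 example\nATGGCC\nTAA\n")
for frame, protein in sorted(six_frames(contigs["contig1"]).items()):
    print(frame, protein)
```

The record name `contig1` and its nine-nucleotide sequence are made up for the example; a real consensus file would hold the assembled contigs.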

The results are set out in Table 1. 
It will be appreciated by persons skilled in the art that numerous
variations and/or modifications may be made to the invention as shown in
the specific embodiments without departing from the spirit or scope of the
invention as broadly described. The present embodiments are, therefore, to
be considered in all respects as illustrative and not restrictive.

Dated this tenth day of December 1997 

CSL LIMITED 

Patent Attorneys for the Applicant: 
F.B. RICE & CO. 
SEQUENCES

Seq ID # 1 
Length: 389 

1 ttcgtccaca tcgtcgcctt cggggatcac cgtgatctgg tcacaccggg 

51 ctgacaggaa aggctccttc tcgcagagct tcttaccggt cagcacattg 

101 ccatcgatca cacgctcgtg agattccttt attgtcagtc ggccggggaa 

151 ggaagcaaag acattgcaac ccggcataat acggacgtat ccgtgagctg 

201 cagcgtctga gccggtcatg gcaatcattc tggtaaaatc ggctttgccc 

251 gtaagcagga aacgtccgat cacgatcagg tcggtagcct tgagcgtcca 

301 caccgtttcg ccccgattga ttggcttccg tatgattgat cagcacgccc 

351 acttttacct gccggatgaa gtcccgtgtt acttcctac 
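
Each entry in this listing gives a declared length followed by rows of ten-nucleotide groups, each row prefixed by the position of its first nucleotide. The Python sketch below shows one way such an entry could be read back into a single string and checked against its declared length, using Seq ID # 1 above as input. The function name and the parsing rules (skip the header, drop numeric position labels, keep only tokens made of a, c, g, t and n) are assumptions made for this illustration.

```python
import re


def parse_listing(block):
    """Return (declared_length, sequence) for one 'Seq ID' entry."""
    declared = None
    groups = []
    for line in block.splitlines():
        line = line.strip()
        match = re.match(r"Length:\s*(\d+)", line)
        if match:
            declared = int(match.group(1))
            continue
        if line.startswith("Seq ID"):
            continue
        # Position labels are digits, so only nucleotide groups are kept.
        groups.extend(t for t in line.split() if re.fullmatch(r"[acgtn]+", t))
    return declared, "".join(groups)


block = """Seq ID # 1
Length: 389
1 ttcgtccaca tcgtcgcctt cggggatcac cgtgatctgg tcacaccggg
51 ctgacaggaa aggctccttc tcgcagagct tcttaccggt cagcacattg
101 ccatcgatca cacgctcgtg agattccttt attgtcagtc ggccggggaa
151 ggaagcaaag acattgcaac ccggcataat acggacgtat ccgtgagctg
201 cagcgtctga gccggtcatg gcaatcattc tggtaaaatc ggctttgccc
251 gtaagcagga aacgtccgat cacgatcagg tcggtagcct tgagcgtcca
301 caccgtttcg ccccgattga ttggcttccg tatgattgat cagcacgccc
351 acttttacct gccggatgaa gtcccgtgtt acttcctac
"""

declared, sequence = parse_listing(block)
print(declared, len(sequence), declared == len(sequence))
```

A mismatch between the declared and the counted length would point to a transcription problem in that entry.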



Seq ID # 2
Length: 912

1 aacgattgtc ggctgattct tgcttcctgc acgatgcagg acncgattgt
51 cagttgattc ttgctccctg cacgatgcag gacgcgattg tcggctgttt
101 cttgctccct gcatgatgca ggacgcgatt gtcagctgat tcttgctccc
151 tgcacgatgc aggacgcgat tgtcagttga ttcttgcttc ctgcacgatg
201 caggacgcga ttgtcagttg attcttgctt cctgcacgat gcaggacgcg
251 attgtcaact gattcttgct tcctgcacga tgcaggacgc gattgtcagc
301 tgattcttgc ttcctgcacc gatgcaggac gcgattgtca gctgattctg
351 ctcccatcaa tgcgctaact atcaagctgt ttgcaactat tttataggac
401 tttcattgaa gtcttttgcc gcagagctga ttcttaagtg tttttcagat
451 tacttgaggt ttgcagagag atcgcatgaa gctctccttt cttcgtcaaa
501 tcaatgcttg tgtctgtctt gatcaatatg agagggggtt attgtgcaac
551 ggtctcaagc tgtaaaaccg gcagctgtgt atagaaacag tctttcggtg
601 atatggcaat cgaacttcct aactgcccaa attttaccgg acagcaataa
651 ctgattatat ggggttagtc catcgggcgg actttctctt cgacaaaggc
701 gatttcgctt tcgtttaagc cgtatttgcg gtagagttgg cgatcaattt
751 ccgccaccgg ctgtgtccaa tcgatgtccg attctgctgt gaagtcttgc
801 agggggacta atcgccaagt ttctttgggg ttatcttgcg ttgccttgag
851 gatacccagc atcgtgcgag caaacttcgt cttcacatag cgcagacaag
901 cctctgcctc gg



Seq ID # 3 
Length: 408 

1 gagaagaaag ctcctgcact gaggaaagga gcgttaggct tgtgagtaat 

51 ctcggacaga cgctcattca cggctgtagt gatcacctgt ttcatatagt 

101 cttccacaag tccgaatatc gatcctcgca cttcttgagg agtggggtcg 

151 ctcttgaagc tgatggagag ctgcgtggta gtagcctcag catcggtagc 

201 aatggctacg ataggctcat cgttgtcctc taccggcgta tagatacgct 

251 ctgctggatt cacgggagca ggaacgtcct tgaagagttc tttgatcttg 

301 ttctccacat agtccacatc gatatctccc acgatcacca gaccttgcag 

351 gtcgggacga taccatttct tataatagtt gcgcagctca tcatgcttgg 
401 aagttgac 
Seq ID # 4
Length: 643

1 cgtgtgagca acactttcct tggggccatg cagacccaga gcacttgtcc
51 cacttgccac ggagagggtg agatcatcac gaagccatgc tccaagtgta
101 agggcgaagg tgtggagatc ggcgaagagg tgatctcatt ccacatccct
151 gccggtgtag ccgaaggaat gcaaatgtcc gtgaacggca agggaaatgc
201 cgcgccccga ggaggcgtga atggcgactt gatagtcgtg atcgccgagg
251 aaccggatcc gaatctgatc cgcaatggca acgatctgat atacaatctg
301 cttatatccg ttccgttggc tataaaagga ggtagtgtgg aagtgccgac
351 gatagacgga cgagccaaga tccgcatcga ggcggggaca caacccggca
401 agatgctgcg tttgcgcaat aagggggttg ccagcgtaaa cggctatggc
451 atgggagacc aactggtgaa tgtcaatgnc tatatccccg aatcgatcga
501 tgccnaagat gagcaggcta tcgcagcgat ggaaaactcg gacagcttca
551 aacctaccga tgctgctcgt aggatatnga caagaaatca gagagatgct
601 ggattgaaag acaatgccat tgtatggtac ttgacctgag aaa



Seq ID # 5
Length: 311

1 gggcggcgag ccggtttgga atacggccgc acgccaaggc atccgtaccg
51 gtgtctactt ttgggtagga tccgaaacgg ntgtgaacgg aaatcggccg
101 tggcggtgga aaaaattctc ctccaccgtt ccgtttcgtg accgtgccga
151 ctccgtcatc gcgtggctcg gactgcccga aaaggagcga ccgcgcttgc
201 tcatgtggta catcgaggag ccggatatga tcggacacag ccaaactccc
251 gaaagcccgc tgacactggc aatggtacac cggttggaca gtgtggtcgg
301 ctatatccgc a



Seq ID # 6
Length: 366

1 gccgtgtaag cgcaataggg tcaagcgttt ggtcagggag gcttatcggc
51 tcaacaaaca cctcctgaac gatgtcctcc aagagagaca gatctatgct
101 actattgcat ttatggtagt atcggatgaa cttcctgact ttcgtacagt
151 ggagagagcg atgcaaaaga gtctgatcng aattgccgga aatgtacctt
201 catcggcttt gaaaaacgag tanatacgat gcgactgatc aaggcttttc
251 tcgtgcaact cttactgctc cccattttct tctacaagcg gtttatatcg
301 ccgcttacac cgccttcatg ccggtttacc ccctcatgtt cgtcctatgc
351 catccgaagc cttacg



Seq ID # 7
Length: 482

1 ctacttatct ataaactcga atcttacgaa ctgttccgca agatggtaga
51 agccatgaac cgtaagaccg tagcgatcct aatgcgtgct cggataccgg
101 taccggaggc tccttcccaa gaagagctgg aacacaggcg gcaaatagaa
151 atccgacatg cagccgaaca acgtacggac atgagtaagt atcggacaca
201 aaaagacgat atagaagccc agcagaaagc acaaagggat gcggcaagca
251 gacctcaggg tgcagctgct ccccagacac cgataagaaa cgagaataag
301 atcgggcgaa acgatccttg tccttgcggt agtggcaaaa agttcaaaca
351 gtgccacggg cgtaacctgt aaaaagattt atgagagaat caccgactat
401 ggtatagaat agtctgngat tctcttttta ttttttctct ctacccgcat
451 ataaaaaaga ctatgatcct atctatgacc gg
Seq ID # 8
Length: 500 

1 cggcgatggg cgatgttgcc ggaatggcct atcttgattc catgtcgaat 

51 gagaaggtct ggttcggcta cacgctgaaa gaagctcaag cccagcaaat 

101 tggtcttggc cttgacttaa aggggggtat gaacgttatc ttgaaactta 

151 acgcaagcga tctgcttcgt aacctctcta acaaaagttt ggatcccaac 

201 ttcaacaaag ctctggagaa tgctgccaag agcacggagc aatccgactt 

251 catcgatatt ttcgtgaagg aatatcgcaa gctcgatccc aacggtcgct 

301 tggccgttat cttcggttcc gggtgacctt cgcgaccaga ttaccgcaaa 

351 gtctacggat gcagacgtag ntgcgtctgc tcaaagaaaa atataatagt 

401 gctgtagaag cttcgtcaat gtgctccgtg gctcgtatcg atgctttcgg 

451 tgtggntgca cctaatttgc agcgattgga aggacagggg cgtatncttg 



Seq ID # 9 
Length: 352 

1 aagccgaaca tgcgtggtac gntgaccagc gtgtagcata gtaccggcac 
51 gtagatggtc agataggcca gccccttgcc aaagacgata cggagtgcgc 
101 cggagtagtg gctgatatcg gtaggtacga ggtcgtgaaa cttattcctc 
151 tcacgcgccg taccggccga gagtcctatg ccgagcagga gcgtctgctg 
201 tatgatcagg atgagtacgg caggtagcag gaaggaagcg aaaccgacgg 
251 tcgggttata cagtgccacg tcttcatagt cgatggggta aganatgatc 
301 tctccctgac gctcggtagc tcctacgctg cgcgctatct tgatctcttt 
351 gt 



Seq ID # 10 
Length: 516 

1 agcgatcaaa gcggcagtgg aactgaccga tcgctatgta tccgatcgtt 

51 tcttcccaga taaggcgata gatgccatgg acgaggccgg cgcgagcgtc 

101 catatcacca atgtggtggc tccgaaagaa atcgagatac tggaggccga 

151 attggcatcg gtgcgagaga acaagctctc ggccgtaaag gctcagaact 

201 acgaactggc tgcctccttc cgcgatcagg agcggcgcac tcagcagcag 

251 atagcggaag agaagaaaaa atgggaagag cagatgtcca agcaccgcga 

301 gacggtggac gagaatgtag tggcgcatgt agtgggcgtt gatgacaggc 

351 gttccggctg agcggctgag cacgggcgaa ggcgaacgtec tgcgcacgat 

401 ggcagatgat ctcaagacca aagtagtagg tcaggacaca gccatcgaaa 

451 ggatggtgca tgccatccag cgcaatcgtc tggggacttc gcaatgaaaa 

501 gacccgaacg ggtctt 



Seq ID # 11 
Length: 401 

1 ttcgaggcat tcctgcgcta tgggctcaag cctacattct tgactcctcc 
51 atccatgcag cgcgctgtcg agatgttcga ctaccgctca ggagaaaaat 
101 acgaatggaa tgcttacccc acctatgaag cctatatcag catgatggaa 
151 gagttccaaa caaagtatcc atcactttgt actacttccg tcattggcaa 
201 gtccgtaaag gatcgtaaac tgatgatttg caagctgacg tcctctgcca 
251 atacagggaa aaagcctcgc gtgctctata cttctacgat gcacggagac 
301 gaaacgaccg gatatgtggt nctgctccga ctcatagaac atctgctgtc 
351 gaactacgaa tccgatccga ggattaagaa cattctggat aaaacggaag 
401 t 
Seq ID # 12
Length: 553 

1 ggcttacacc ctgattttcc tgtgcacgaa aagttttctt atacgtttcc 

51 tccaagaccg gttcgccgga tccttttttc ttttttattt cgaacggact 

101 attctgccag ccatccgtat ggtagaagaa atagttggta ttgctactga 

151 acttaccgat gtttatatcg agcgaagtat tggtttgccg atcgttgnac 

201 ttcgatatgc gatgtgacgt atatgcactc agtcgattcg tattcttttt 

251 tgtgatcaca ttgattaccc cggcgatggc atcggatccg tagagcgaac 

301 tcgaagcacc tttaccagtt cgatccgttc gatctgatca ggagaantac 

351 gactcaaatc ggcctgaccg cctacatcgc cgtacacacg cttaccatcg 

401 ataaggatga ggatatactt actgctaagg ccgntcagct gcatgaaaga 

451 gcccatcaga ttggggccga agtcaaaaga cggactcagc cctgcataag 

501 gcctcggaag taggagccga gaaagaggct atgtccttag cggtaaggac 

551 ttc 



Seq ID # 13 
Length: 450 

1 cacacaataa cttgtgctgc attttcttat tgtattttca gaaaaagaag 

51 caaccccaat gacactccac tttgtgtcgt tggggttgtt gctttagaag 

101 taagtgatca gattatttct tagctgcttt gttgagcttg cgcttttgga 

151 tctcgtaggc aaggggtgta gcaacaaaga gcgtagagta tgtaccgata 

201 acgataccga gcaggatcga gaacgtgaaa ctacgcatcg tagcacctcc 

251 aaagatgaag attaccaaca taacgataaa cgtagtcaaa gacgtattta 

301 atgttcgacc caatgttgaa ttaagggcat cgttgatcac ctgatagcga 

351 tctctgttgg ggtacaattt catcgtctct cggatacggt caaatacaac 

401 cacggtgtca ttgagcgagt naccgatgat agccagaata gcagcgatga 

Seq ID # 14 
Length: 383 

1 ggaatgtgcc gaacgaacct ccgctcaaat cgacacggcc ataaggagcc 
51 agtcccaaat tatccgtacg catattgaca cttgccccaa aagctccggc 
101 accattggtg gaagtaccca cacctcgctg cacctgaagg tcttcgatgg 
151 aagaggcgaa gtcgggcata ttcacccaaa agacggactg agattcggag 
201 tcgttgaggg gtactccatt ggtagttatg ttgatgcgat tggcatcggt 
251 gccacgcacg cgaaagccgg aatatccgat acccgtaccg gcatcgctgg 
301 tggctaccac ggagggagtc agcatcagca gataggggat gncacgacca 
351 taattggact tggaaaggtc ggccttgcga acg 

Seq ID # 15 
Length: 477 

1 tcggagagag acgtttttcc ttcgaaaaga taactgccat cccccaaaac 
51 cttaaagggg agttcttcct catcgtactc gtccgtaatc tcgccgacga 
101 tctcttccca atatgtcctc cattgtgatc agtccgcaag tgccaccgaa 
151 ctcatccaca acgatggaga catgcacctt attggctctg aactcctcga 
201 gcaaatcatc tatgcgcttg ttttcgggga caaaatatgc tttacgaatc 
251 agaggatgcc agtcgaattc atcgccttta tccatgtgtg ggattagatc 
301 tttgatgtaa atcacccctt tgatattgtc ttctgacccc tctgaaacgg 
351 gaagtctgga ataacccgac gaaacaacga agtcaagcat cttacgaaat 
401 ggccagctca gatccacatc cacaatatcg atacgcggga accatggatt 
451 tcgcaggctg gcttattata ggaattg 






Seq ID # 16 
Length: 486

1 gctcattttt acctttcttc gtttgaaatg aaaacgactc cgtttgcagc 

51 acgagctcca taaatagatg ttgcagaagc atctttcaaa acggacatag 

101 attcaaaatc attcggattc atcgtagcca caacatccaa agaagtttgc 

151 ataccatcca cgatatacaa tggtgcagag cttgccccca acgaccctgt 

201 accatggatc tccacagaag cgacggcagt agggtcaccg gatgtagtca 

251 taacctgcat accggctacc tgaccttgga gggcatccat gatattggca 

301 acgggctttt ccgcgagctt ttcgctggac actttggcca cagaaccgga 

351 aacagtgctg agtttctgtc ccgtaccgta acccaataca actacctgct 

401 ccagaacctt agagtccgga tccagtacga tcttccatcc acatttagcg 

451 aatggcgaac tcctttggta atcatacccg ggaaat 



Seq ID # 17 
Length: 386 

1 ccgaacatct catcacacnc aatagggaag acctcagtgg catagccata 

51 gccgtagcga tggagggcat tcgcccgata ctcatcgaag cgcangcttt 

101 ggtcagctcg gccatttatg ccaatccgca gcgttcggcc acgggcttcg 

151 atattcggcg gatgaacatg ctcttagccg tactggagaa acgtgccggc 

201 ttcaagctca tacagaanga tgtgtttctg aacattgccg gaggtatcaa 

251 aatagccgat ccggctacgg atctggccgt tatctcggca gtgctggcgt 

301 ccagtctgga catcgttatc ccgccggccg tatgcatgac gggcgaagtc 

351 ggactctccg gananatacg tcccgtgagc cgcatc 



Seq ID # 18
Length: 1013

1 gattatgatg aagagacttg ggggaaatgg tgtgcacagg ccgatgccga
51 cacactggca ggagctttgt ctttcttcct ccatgcagcg aacaagggga
101 tcgaggtctt ttacgtcacc aaccgcagag acaatntgcg cgaagcaact
151 nttcagaacc ttcagcgtta cggattcccc tttgccgatg aagaacattt
201 gcttacgacc catgggccat ccgacaaaga accccgtcgg ctcaaaatac
251 aagaacagta tgaaatagta ttgctcatag gagacaactt gggcgacttc
301 caccacttct tcaatacgaa agaagagtcc ggacgcaaac aggctctggg
351 cctgacagcc ggggagtttg gccggcactt catcatgctg cccaatccca
401 actacggatc ttgggaaccg gcatggtacg gcgggaagta tccgccactg
451 cccgaaagag acaaagcact taaacaactg cactcacaga acagcagata
501 gctccttaag caaacacatc gaatagacag actcacacta tggacaacaa
551 acgactaagc aaaatagaaa gactgctcca gaaagaactc agcgagatat
601 tcctgcggga tgcgaaatcc ctgccgggcg taatagtttc ggtaacgaac
651 gtacgtgtaa gtcccgacct cagcatcgca cgtatacacc tgagtatatt
701 cccatccgag aagagcagcg agattcttga gagcatcaaa cacaatacaa
751 agacgatccg ttatgacctc gggcagcaag ttcgtaccca actgcgcaag
801 ataccggatt tgacattcta catagatgac tctctggatt atctggagaa
851 tatagaccgt ttgctcaatc aataagaaac ggtcgctctc tatcaagacg
901 ctgtgaactt cccttttttc atagcccgcc gttacttgtt ctcccgcaaa
951 agattcagtg cggtcaatgt ggtttcgctc gtttcagcga tagctgtctg
1001 cgtggtctct tcg
Seq ID # 19
Length: 445 

1 aacaactaat gtctcacaaa ttaatttaag aacagagatg aaaaaactga 

51 ttttagcgac tttgggactt atggccattg ccatgctctc atgttcaagc 

101 aacaacaagg atttggagaa caaaggggag gctactcttt tggtaacgtt 

151 tggtagctcc tataaagctc cacgcgaaac ctatgcgaag attgagaaga 

201 cttttgccgc agcttatccc gatcaaagga taagctggac atacacgtct 

251 tctattatcc gaaagaaact ggctcagcag ggtatttata tcgatgctcc 

301 ggatgaggct ttggagaaat tggctcgtct gggttataag aagatcaatg 

351 ttacagagtc ttcatgtgat tcccggccga gaatatgatg agatgatcga 

401 ctttgttcaa taattttaag gcagcacata gtgatattac tgtga 

Seq ID # 20 
Length: 488

1 cggccgaagc ccagaccgat caatgtctgt tcgatcgcag catgatagtt 

51 gccctgctgc atgagagaga gggtctcgct catattcgta taatgctcta 

101 tcagtcggat atagtcatcc gattcgtagt ccgtacgtcc ggccatttca 

151 tcggacagac gccgtatctc ttcctctatt tggcgaatat cgttgaaagc 

201 ctgctcgacc tcttcgtaaa ccgtgtgtcc gtcctgcaaa cgcatcacct 

251 gcggcagata gcctatgcgg atccccttgg ggcgtgctat gtgtccggat 

301 gtcggttctt ccatgccggc aatcagcttg agcagcgtac tcttgccggc 

351 accgttcttc cctacaagag cgatacggtc gcgcctgttg atgacgaatg 

401 atacctgatc gaagagcaga cgggtgccga aatcgacagt caggttattg 

451 acggagatca tgacttcgtg ctcattcgnt tgatgatg 

Seq ID # 21 
Length: 836 

1 cgcattccgt cggatatgct catcggcaaa ctggaatcgc tcatcgcttc 

51 gtacataacc ggatcgatcg gaagagaaat agcatgaaga aggaggtgtg 

101 tcaataatca tggcgcacct ccttgcatta atatgggacg gtcgggtaca 

151 ccggctattc cgagactcct taaggagtcc ctaccgagac ccttaaggag 

201 tctccaccaa gacccctaag gagtctccac cgagatccct aaggagtccc 

251 taccaagacc cctaaggggt cccaacagag actccttagg ggttcctcaa 

301 tgctttactt caggaggggt tcgtgcggtc ttataatcca ttcgaatgga 

351 gacatcggga gcagtgaccg gcgaaaggaa gccgaagctt agcgaatctt 

401 accgtcgaac agattgatga tgcggccggc actacgtgca tcgtgctcgg 

451 agtgcgtcac catgacgatg gttgcacctt cgcgattgag acctctgagc 

501 agttccatga catcggctcc gtttttggag tcgaggttac ccgtgggttc 

551 atcggcgagg atgagcttcg gattggccac cacggcncgg gcgatagcca 

601 cgcgctgctg ttgtcctccg gagagctgat tggggaagtg gccggcccgg 

651 tggctgatgc tcatcttgcg cagtgcctcc tccactcgct ctttccgctc 

701 ggaagccttc acacccagat ngacgagcgg caactccacg ttctcgctta 

751 ccgtcatctc ttcgatgang ttgaagctct ggaatacgaa gccgatattg 

801 cccttacgga cgcagtcctg tctttttccc ggaagt 



Seq ID # 22 
Length: 365 

1 cggcaaagag atattgaaag gaatcaatct ggagatcaat gccggagaga 

51 ttcatgctat catggggccg aacggatcgg ggaaaagtac gctctcttcc 

101 gttttggtgg gacatccctc ctttgaagtc acggatggag aggtgacatt 

151 caatggaatc gacctgctcg aactcgaacc ggaagaacgt gcacacctcg 

201 gactctttct cagtttccaa tatccggtcg agatcccggg cgtcagcatg 

251 gtgaatttca tgagggcagc tgtcaatgaa cataggaaag cgatcggagc 

301 agaacccgta tnggcaagcg acttcctcaa gatgatgcga gagaagcgtg 

351 ccattgtgga gctgg 
Seq ID # 23
Length: 640 

1 ccactttaac tataaagcct ctatactttt atagtataaa gcctgcgagc 

51 tttatagtcg gaagtattaa agggatgatt gtcgtgctac acttgtcaag 

101 aaaaaggatc agaacggata gcctactgca atgcgccaag cgaaattgga 

151 agaaaggttt gggcgtgtga tagcccattt gtaacgccct gtctgctgag 

201 gatcgtaggc tttcagtccg gcatccagcc gcacaaggaa ataatcgaag 

251 tcgagacgaa gccccagacc gtaggccaaa gctatttcct tgtagaagcg 

301 atcgaaacga aagagaccgt cctcctgatt ctcatactcc tttatcgtcc 

351 agacattgcc ggcatcgaca aaagctgctg cgcgaaactt ccagaacagc 

401 tttgtcctgt attcgacatt cagatccaga cgaatatcac ccatctgatc 

451 gaagaaggtc ttgtccggag tcatcttcat actccccggg ccgagggtac 

501 ggacactcca gccgcgaacg ctgttcgatc ctccggcaaa gtaacgtaac 

551 taaagggtat atggcgagca ttgccataag ggaaagccag tccgaaaccc 

601 agattgcagt gccaaagtat tggccttttc gagagaacgg 



Seq ID # 24 
Length: 771 

1 ccaggacaat gcaaattatt tccatcgtct gcgagaaatt acccttgaaa 

51 tcagcaacac gaagttggtg ccggcctctc aacttccaaa gtattggaat 

101 ctgaacaaag aatctctgct tgctctgatc gaagaatcct tatacggcat 

151 ccatggtaca gtgacttccg ctgcgaacgg acagcctctc aaatgccaga 

201 tcttgataga aaaccatgac aagcgcaact ccgatgttta ctccgatgct 

251 accacaggct actacgtacg tcctatcaaa gccggcactt atacggtgaa 

301 atacaaagcc gagggttatc ctgaggcaac tccgnaccat taccgatcaa 

351 ggacaaagaa accgtcatca tggacatttg cattgggcaa cttcggttcc 

401 tctgcctgta cccgatttca cagcttctcc tatgaccatc tcagtaggcg 

451 aaaggcgtcc aattccaagg atcaaacgac aaataacccc acgaattggg 

501 agtggacgtt cgaaggcgga cagcctgcca tgagtacaga gcagaatccg 

551 ctcgtatcct atagtcatcc cggtcagtac gacgttacgc tcaaagtgtg 

601 gaatgcaagt ggttccaaca cgattacgaa agaaaaattc atcactgtca 

651 atgccgatat gcctgtagct gaattcgtcg gtaccccgac ggaaatagaa 

701 gagggccaga cggnatcttt ccaaaaccaa tccaccaatg ccaccaacta 

751 cgtatggata ttcgatggcg g 



Seq ID # 25 
Length: 521 

1 gcattgattg taaacagctt cttcacattg ggcgtatngg cttctttnca 

51 tgccgtgctg accctctcgg gtatngcagg tttggtgctg acgctgggta 

101 tggctgtgga tgccaacgta cttatcttcg agcgtatcaa agaagagctt 

151 cgtgccggta agactccgat tcgtgccgtt acggatggtt atggcaacgc 

201 tttctctgcc atcttcgact cgaacgttac gactattatt accggtatca 

251 tcctattcct ctacgggacg gggccgattc gcggttttgc cactacgttg 

301 attatcggtc ttatcgcttc tttcattacg gctgtcttct tgactcgtat 

351 cgtcttcgag aaactggcga aaaaaggtcg tttggataag attacattca 

401 ctacgagcat tactcgcaat ctccttgtca atccctcata caacatcttg 

451 ggtaagcgca agaccggctt tatcattccc ggtgattatc atcgtttggg 

501 acttatagct tcatttacaa t 
Seq ID # 26 
Length: 594 

1 cgactcccga tgttccgata atagagatgc cgttccgtaa gccgagtcgg 

51 ggattgaatg tctgggtaca gcctctcggc cttcgggtac gctaatcgta 

101 atatccacac ctccctgcgc atagagtcgg cgtacctctg ctgtcatcat 

151 tcgtcgaggt acgaggttga tagccggacc tccgacctcc agaccgaggc 

201 cggggagcgt cactaccccc accccttcac cctgcaggaa gcggacttcc 

251 tcatgttcgg gattgagcct gatcgtagcg cataccgcca tgccattggt 

301 cacatccgga tcatcacctg catctttcag gactgcggat acgacagcat 

351 cttcttcctc tcgaatttcc gctatgggca gactgactat ttcgcccgaa 

401 ggcaattcta cgggagcttc ggcgagagag ccaagcccca tcaatcggta 

451 catggctgct actactgcag cggtagctgt ggtgccggta gtgaatccgt 

501 tcggagtgaa aagaatcccg gacgaagcgt ctaccggccg tctaaaccga 

551 caggccgcta cggaatgaag aagaagcaaa ggggacgtcc acgg 



Seq ID # 27 
Length: 587 

1 ccctgcattg ataaccactt tacgcttgtt gatagcagtt accctaccca 

51 ttgccacttg cttttcgtgg atggtattga gcgtttngtc ataggcagct 

101 tcctgctcct gacgactcac ggcagcatgg acgccaccgg cctcaaactc 

151 ttcccagttg aaatcctctc tgggctgaat gttctttaag ttttccattt 

201 gttttttaat cgtcttgttg ctaatagatt ggacattata tggtcatatc 

251 tctcaaaagc acggnaaagt tacaataact tttcgttact ctcttcattg 

301 tgatcaacct gcagattccc cccaccggtg caacattaag caagcgacct 

351 caggattgac tcccaagagt caaccgaaat caggaaacga cactatttga 

401 aattacaatg ttgcaatatc gatcttggcg taaaactgat cggaaacggt 

451 cccgatgttt cttcacaatt actgcttttt ttgacctcct caagcctcat 

501 tttttcagta cacgtcacgt cagtcgtcag tataaaaaag tgacgcgtgc 

551 ctttnctgaa aggcgcgcga gaatttcccg tttgcgc 



Seq ID # 28 
Length: 740 

1 gtatcggaag gccgcaaccg caccaaggca cagatcgaca gcatcgctca 

51 aggccgtgta tggctcggcg acaaagctct tgcactcg^t ttggtggatg 

101 agcttggagg tttggacaca gctatcaaac gggccgcgaa gctggctcag 

151 ctcggtggca actacagcat agagtatggc aagaccaagc gcaacttctt 

201 cgaagagttg ctctcctcat cagcagcgga tatgaagtct gccatcctga 

251 gtaccattct ctccgatccg gaaatagaag ttctgcgcga actccgctcc 

301 atgccgcccc gtccttcggg catacaggca cgtctcccct attacttcat 

351 gccgtactga taaatgagac aaccgtaatt gctgaagaga tggatgcgcc 

401 ccgtatcaac aagtggctca aaccgctttc cgccctctac ggggtgggcg 

451 tgaggttgcg caactacctc ttcgacaaga acgtcctgat ttcgaactct 

501 ttcgacatcc ctatcgtctg tgtaggcaat atcaccatcg gcggcaccgg 

551 taagacaccc cacgtagaat acctgattcg gctcctgcat ccacgctatc 

601 gtgtagcagt ggttagccgc ggctataagc ggaaaaccaa agggatgatc 

651 gttgcaaccg aaggatcgac tgcatgggat ataggagacg aacctcgtca 

701 gatcaaacga aaatatccgg acctgaccgt catcgtggat 
Seq ID # 29 
Length: 613 

1 ctcccgattc gcccataagg tgagcctgca caacagttgc cacggtgtgc 

51 gcgaactgca tctgtccacc cccagtgaag tgcaccgacc gtaccacaac 

101 aaggtgcgcc ggctattgga gatggtgcag ggcatagagg tattcgagcc 

151 gaagcgaata gacgaatgct gcggtttcgg cggtatgtac tcggtggagg 

201 agccggaggt atccacctgt atggggcatg acaaggtgct ggatcacata 

251 tccacaggtg cggagtacat cacagggccg gacagctcgt gcctcatgca 

301 tatgcaggga gtgatagaca gagagaaatt gcccgatcaa gacaattcat 

351 gcagtagaaa ttttagcagc aaacttattg agtacgaagc atagcgaagc 

401 ggctgcccgc tttttggaga ataagtccgg agcccaagtg gcatgacgag 

451 acgctctgga atggtgcgcc acaaacgcga catccagcgt gatacggtgc 

501 cccgagtggg gaaagatctg cgccaactgg gctcatgaaa tcaaacgctt 

551 caatgtgaca cacttggatt ganctgctgc tgcgatttga agaaatgctt 

601 cgtcgaaccg gtg 



Seq ID # 30 
Length: 560 

1 tgggtatagc cagagctttg ctggcgaagc ctgcgttgat cctggccgac 

51 gaacccacag gcaacctcga ttcggtgacc ggattgcaga tcgcttctct 

101 gctctacgaa atcagtaagc agggcactgc agtacttatg agcacgcaca 

151 acagcagcct gctgtcgcat ctgccggcac ggacattggc cgttcgtaag 

201 aatggcgatg cctcctcttt ggtcgagctt gagtgcagat gctgtttcaa 

251 gaaaaaatac ggaaatagat tagcacgata agatcaggaa ttgaaagttc 

301 tcaaatttgg cggtacgtct gtaggagatg ctgaagcgta tccgcaagtg 

351 ttgcccgact gattactttc ggtaaaagga agaaaaatta tagtcctttc 

401 ggctatggcc ggaacgacca attcgcttgt cgaaatagcc tcacaccttg 

451 tcaaacgcca atgtggcaca ggcaaagagg gtgtgccaag gtgttgcgag 

501 agaaatatca tcgcgaaata aatgctctat ccaaacgtnc ggataccttg 

551 agcgcagcca 